Strong scaling analysis of a parallel, unstructured, implicit solver and the influence of the operating system interference
Abstract
PHASTA falls under the category of high-performance scientific computation codes designed for solving partial differential equations (PDEs). It is a massively parallel, unstructured, implicit solver with particular emphasis on computational fluid dynamics (CFD) applications. More specifically, PHASTA is a parallel, hierarchic, adaptive, stabilized, transient analysis code that effectively employs advanced anisotropic adaptive algorithms and numerical models of flow physics. In this paper, we first describe the parallelization of PHASTA's core algorithms for an implicit solve. One of our key assumptions is that, on a properly balanced supercomputer with appropriate attributes, PHASTA should continue to scale strongly at high core counts until the computational workload per core becomes insufficient and inter-processor communication starts to dominate. We then present and analyze PHASTA's parallel performance across a variety of current near-petascale systems, including IBM BG/L, IBM BG/P, Cray XT3, and a custom Opteron-based supercluster; this selection of systems with inherently different attributes covers a majority of potential candidates for upcoming petascale systems. On one hand, we achieve near-perfect (linear) strong scaling out to 32,768 cores of IBM BG/L, showing that a system with desirable attributes will allow implicit solvers to scale strongly at high core counts (including on petascale systems). On the other hand, we find that the relative tipping point for strong scaling differs fundamentally among current supercomputer systems. To understand the loss of scaling observed on a particular system (the Opteron-based supercluster), we analyze the performance and demonstrate that such a loss can be attributed to an imbalance in a system attribute, specifically the compute-node operating system (OS).
In particular, PHASTA scales well to high core counts (up to 32,768 cores) during an implicit solve on systems whose compute nodes use lightweight kernels (for example, IBM BG/L); however, we show that on a system where the compute-node OS is more heavyweight (e.g., one with background processes), a loss in strong scaling is observed at a much lower core count (4,096 cores).
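To make the notions of "strong scaling" and "tipping point" used in the abstract concrete, the following is a minimal sketch of how parallel efficiency for a fixed problem size is commonly computed. The function names, the 0.8 efficiency threshold, and the timings are all hypothetical and chosen for illustration; they are not taken from the paper or from PHASTA.

```python
from typing import Dict, Optional


def strong_scaling_efficiency(base_cores: int, base_time: float,
                              cores: int, time: float) -> float:
    """Parallel efficiency relative to a baseline run of the same problem.

    1.0 means perfect (linear) strong scaling; values below 1.0 mean the
    run time did not drop in proportion to the added cores.
    """
    return (base_cores * base_time) / (cores * time)


def tipping_point(timings: Dict[int, float],
                  threshold: float = 0.8) -> Optional[int]:
    """Return the smallest core count whose efficiency is below threshold.

    timings maps core count -> wall-clock time for a fixed problem size.
    The smallest core count serves as the baseline. Returns None if every
    run stays at or above the threshold (an assumed, arbitrary cutoff).
    """
    counts = sorted(timings)
    base = counts[0]
    for n in counts[1:]:
        eff = strong_scaling_efficiency(base, timings[base], n, timings[n])
        if eff < threshold:
            return n
    return None


# Hypothetical timings in seconds (illustrative only, not measured data).
example = {64: 80.0, 128: 40.0, 256: 21.0, 512: 16.0}
print(tipping_point(example))
```

With these made-up numbers, efficiency stays near 1.0 through 256 cores and drops to about 0.63 at 512 cores, so the sketch reports 512 as the tipping point. Comparing such a figure across machines is one simple way to express the abstract's observation that the tipping point differs among systems.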
Similar resources
A New Implicit Dissipation Term for Solving 3D Euler Equations on Unstructured Grids by GMRES+LU-SGS Scheme
Due to improvements in computational resources, interest has recently increased in using implicit schemes for solving flow equations on 3D unstructured grids. However, most of the implicit schemes produce greater numerical diffusion error than their corresponding explicit schemes. This stems from the fact that in linearizing implicit fluxes, it is conventional to replace the Jacobian matrix in t...
Simulation of Store Separation using Low-cost CFD with Dynamic Meshing
The simulation of the store separation using the automatic coupling of dynamic equations with flow aerodynamics is addressed. The precision and cost (calculation time) were considered as comparators. The method used in the present research decreased the calculation cost while limiting the solution error within a specific and tolerable interval. The methods applied to model the aerodynamic force...
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strateg...
A new 2D block ordering system for wavelet-based multi-resolution up-scaling
A complete and accurate analysis of the complex spatial structure of heterogeneous hydrocarbon reservoirs requires detailed geological models, i.e. fine resolution models. Due to the high computational cost of simulating such models, single resolution up-scaling techniques are commonly used to reduce the volume of the simulated models at the expense of losing the precision. Several multi-scale ...
Journal: Scientific Programming
Volume 17, Issue
Pages -
Publication date: 2009